Update shapefile.py #125

Closed
wants to merge 2 commits into from

Conversation

FourSpaces
Add a __cuttingStr function that truncates non-English UTF-8 strings to a specified length in bytes. It preserves the integrity of each UTF-8 character, so a multi-byte character is never cut off halfway through its encoding.

Example:
w = shapefile1.Writer()
w.shapeType = 1
w.autoBalance = 1
w.field('TEXT', 'C', size=10)
w.field('SHORT_TEXT', 'C', size=10)
w.point(121.45291, 31.27055)
w.record('Hello1', '中国的汉字1')
w.record('Hello1', '中国的汉字2')

@karimbahgat
Collaborator

This is a good point. I guess there is currently a danger that truncating a text value cuts through a multi-byte UTF-8 character at the end of the string, leaving an invalid byte sequence.
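To make the risk concrete, here is a small sketch using one of the strings from the example above and plain Python byte slicing. It does not call pyshp; the field size of 10 is taken from the example, and the variable names are only illustrative.

```python
# Illustration only: plain Python, no pyshp involved.
text = '中国的汉字1'
encoded = text.encode('utf-8')  # 5 CJK characters (3 bytes each) + 1 ASCII byte

# Naive truncation to a 10-byte field keeps 3 whole characters (9 bytes)
# plus the first byte of the fourth character.
truncated = encoded[:10]

try:
    truncated.decode('utf-8')
    intact = True
except UnicodeDecodeError:
    # The dangling lead byte at the end is not valid UTF-8 on its own.
    intact = False

print(len(encoded), intact)
```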

So I really welcome this addition, but would make two small requests for changes:

  1. I see this is based on v1.2, not the more recent 2.0. The most recent version first sends the value off to be encoded in b(), and I wonder if the truncation could be implemented inside that function, with an optional 'size' arg giving the number of bytes to truncate the text to. That would keep things grouped and avoid creating a new special method.
  2. Add it without all the formatting edits throughout the script. Although they are great, they make it difficult to see which changes pertain to the UTF-8 handling, and I am not sure whether any errors might have crept in there. They would be better as a separate PR.
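As a rough illustration of request 1, the sketch below shows one way an encoding helper could accept an optional size. This is an assumption, not the actual b() from 2.0: the signature, the 'size' parameter name, and the use of errors='ignore' to drop a trailing partial character are all hypothetical choices made for this example.

```python
def b(value, encoding='utf-8', size=None):
    """Hypothetical sketch: encode value, optionally truncating to size bytes.

    Not the real pyshp b(); the signature and behaviour are assumptions.
    """
    data = value if isinstance(value, bytes) else str(value).encode(encoding)
    if size is None or len(data) <= size:
        return data
    # Cut at the byte limit, then drop any incomplete character left at the
    # end by decoding with errors='ignore' and re-encoding.
    return data[:size].decode(encoding, errors='ignore').encode(encoding)


short = b('Hello1', size=10)     # fits, returned unchanged
cut = b('中国的汉字1', size=10)   # 16 bytes cut down to 3 whole characters
print(short, len(cut), cut.decode('utf-8'))
```

With this approach the result can be shorter than the requested size (9 bytes here rather than 10), so the caller would still need to pad the field as before.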

Hope you can resubmit this for v2.0.